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Abstract 

The  advent  of  affordable  sophisticated  audio  solutions 
for  the  home  PC  market  has  paved  the  way  for  low-cost 
true  three-dimensional  spatial  audio,  which  can  produce  a 
sound  effect  from  any  3D  position  source.  The  home  en¬ 
tertainment  market  has  guided  the  development  of  audio 
solutions  for  PCs  through  hardware  support  for  stereo  and 
Dolby(8)Surround  Sound  versions  4.1,  5.1,  and  7.1.  How¬ 
ever,  these  solutions  provide  sound  panning  only  in  a  pla¬ 
nar  configuration,  typically  around  the  user’s  entertainment 
room.  The  drivers  for  these  home-market  cards  (e.g.  Direct 
Sound)  do  not  have  support  for  non-planar  configurations. 
There  are  other  cards  and  drivers  available  that  provide  true 
3D  spatial  audio,  but  they  are  geared  for  the  professional 
user  market  and  have  steep  price  tags  over  one  thousand 
dollars.  We  explain  how  to  use  inexpensive  consumer-level 
hardware  and  free  software  for  Linux  to  build  a  true  3D  spa¬ 
tial  audio  system. 

1  Introduction 

Many  computer  systems  set  up  for  advanced  gaming  in¬ 
clude  Dolby®Surround  Sound.  The  typical  speaker  con¬ 
figurations  are  4.1  (4  speakers  and  1  subwoofer),  5.1,  and 
7.1.  This  system  is  designed  for  all  speakers  to  be  located 
on  a  plane  centered  at  the  listener,  and  thus  it  is  not  possible 
to  have  a  sound  truly  be  emitted  from  above  or  below  the 
listener-although  some  systems  attempt  to  simulate  that  ef¬ 
fect.  Imagine  a  game  scenario  where  a  monster  is  climbing 
down  a  wall  above  and  behind  the  player,  while  at  the  same 
time,  a  mouse  is  scrambling  across  the  floor  behind  the  lis¬ 
tener.  In  a  planar  surround  system,  the  sound  effects  for 
both  the  monster  and  the  mouse  would  come  from  the  rear 
speakers,  making  it  hard  to  distinguish  the  actual  locations 
of  the  sound  sources.  With  true  3D  spatial  audio,  the  mon¬ 
ster’s  sound  effects  could  be  played  from  speakers  located 
to  the  back-upper-left  and  the  mouse’s  sound  from  speakers 
located  to  the  back-lower-left  and  back-lower-right.  The 
player  will  have  a  much  better  feel  for  what  is  creating  the 
sound  and  where  the  sound  is  coming  from  then  with  tra¬ 
ditional  Dolby(R)Surround  Sound.  Now  the  player  can  arm 
the  rocket  launcher  and  turn  towards  the  back-upper-left  di¬ 
rectly  and  blast  the  monster-no  need  to  aim  towards  the 


harmless  mouse  running  across  the  chamber  floor  behind. 

Spatial  sound  has  been  available  for  several  years  and  is 
primarily  employed  in  immersive  virtual  environments.  The 
systems  are  not  in  mass-scale  production  and  often  must  be 
installed  by  professionals,  making  them  costly  and  out  of 
the  reach  of  most  home  users.  We  have  devised  a  low-cost 
true  3D  spatial  audio  solution  that  requires  only  inexpensive 
consumer-level  hardware  and  open  source  software.  This 
solution  allows  arbitrary  placement  of  speakers,  not  neces¬ 
sarily  co-planar  like  other  systems.  Our  3D  spatial  audio 
solution  is  the  first  that  we  are  aware  to  provide  true  3D 
sound  at  such  a  low  cost.  We  describe  the  hardware  and 
software  we  used  and  how  we  set  up,  configured,  and  tested 
the  3D  spatial  audio  solution. 

2  Background  on  3D  Spatial  Audio 

Preliminary  technology  for  3D  spatial  audio,  Fantasound 
[2],  was  first  developed  for  the  movie  industry  in  the  late 
1930s  by  Disney.  Over  the  years  a  great  deal  of  work  has 
been  done  to  advance  the  field,  especially  by  the  Dolby  cor¬ 
poration.  In  the  last  few  decades,  researchers  enabled  per¬ 
sonal  computers  to  emit  spatial  audio.  Today  spatial  audio 
is  commonplace  in  modern  computer  games.  Home  sys¬ 
tems  typically  use  headphones  or  a  planar  array  of  speakers, 
usually  in  a  preset  configuration,  such  as  Dolby(R)Surround 
Sound  5.1. 

Headphones  present  a  unique  opportunity  to  provide  in¬ 
expensive  3D  audio.  Algorithms  that  use  head-related  trans¬ 
fer  functions  (HRTFs)  [10]  can  create  convincing  3D  spa¬ 
tial  audio  on  headphones  using  a  simple  stereo  sound  card. 
HRTFs  use  data  about  how  sound  is  transformed  by  the 
user’s  body  (especially  the  shape  of  the  ears)  for  mapping 
sounds  with  3D  positional  sources.  The  technique  relies 
heavily  on  applying  different  time  delays  for  each  ear.  Ul¬ 
timately,  we  decided  not  to  use  headphones,  because  we 
needed  a  system  that  scaled  easily  to  many  users.  It  was  far 
more  practical  and  cost  efficient  to  use  speakers. 

There  are  a  number  of  high-cost  professional-grade  hard¬ 
ware  packages  available  (e.g.,  RME  Hammerfall  series  [9], 
M-Audio  Delta  series  [4]  and  Lake  Audio  [3])  that  provide 
true  3D  spatial  audio.  Each  package  has  a  cost  exceeding 
one  thousand  dollars,  boasts  high  sound  quality,  and  has 
a  large  array  of  features  aimed  at  the  professional  market. 
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Although  the  acoustic  quality  of  these  packages  is  undoubt¬ 
edly  higher  than  that  of  our  low-cost  3D  audio  solution  in 
terms  of  audio  clarity  and  fidelity,  both  options  provide  true 
3D  spatial  audio. 

When  we  started  putting  together  a  spatial  audio  sys¬ 
tem,  there  was  no  inexpensive  hardware  and  software 
combination  to  produce  true  3D  spatial  audio.  While 
there  are  software  APIs  that  allow  arbitrary  (not  neces¬ 
sarily  co-planar)  positioning  of  sound  sources  (for  exam¬ 
ple,  Microsoft®  DirectSound®  [5],  the  open  source  com¬ 
munity’s  Advanced  Linux  Sound  Architecture  (ALSA)  [1], 
OSS  [8],  and  OpenAL  [7]),  the  low-level  drivers  only  offi¬ 
cially  support  the  co-planar  4.1,  5.1,  and  7.1  speaker  posi¬ 
tions  mentioned  earlier.  There  is  no  way  to  tell  the  drivers 
that  the  speakers  have  been  moved  to  an  alternate  configura¬ 
tion,  for  example,  with  speakers  above  or  below  the  listener. 
So  even  though  software  developers  could  position  a  sound 
above  or  below  the  user’s  head,  the  low-level  drivers  still 
assumes  the  sound  was  emitted  in  a  circle  around  the  user’s 
head.  The  bulk  of  the  true  3D  spatial  audio  support  comes 
from  customized  APIs. 

Tommi  Ilmonen  at  the  Helsinki  University  of  Technology 
(HUT)  developed  a  3D  spatial  audio  API  called  Mustaju- 
uri  [6]  that  is  built  upon  the  ALSA  drivers.  The  Mustajuuri 
API  implements  Vector  Base  Amplitude  Panning  (VBAP), 
introduced  by  Ville  Pulkki  [12],  as  the  underlying  3D  spa¬ 
tial  audio  model.  In  short,  VBAP  is  the  algorithm  respon¬ 
sible  for  moving  a  sound  across  a  3D  array  of  speakers  and 
making  the  sound  appear  to  come  from  a  specific  direc¬ 
tion.  VBAP  selects  the  three  speakers  closest  to  the  virtual 
sound  position  and  calculates  the  required  volumes  for  each 
speaker.  See  Figure  1  for  an  example  of  how  VBAP  pan¬ 
ning  works.  Mustajuuri  also  simulates  depth  for  audio  by 
using  time  delays  and  distance  attenuation.  This  makes  it 
possible  to  position  a  sound  anywhere  in  space  relative  to 
the  listener.  Mustajuuri  has  already  been  used  to  produce 
3D  spatial  audio  using  high-end  audio  cards,  but  up  until 
now  has  not  supported  low-end  audio  cards. 

3  Hardware  Selection  and  Setup 

The  hardware  needed  to  set  up  a  low-cost  3D  spatial  audio 
system  includes  a  commodity  sound  card  with  certain  fea¬ 
tures,  speakers,  and  audio  cables.  We  describe  our  choices 
for  hardware  components  and  the  steps  needed  to  set  up 
the  hardware.  Throughout  the  discussion,  refer  to  Figure  2 
for  an  illustration  of  the  hardware  interconnections,  speaker 
placement,  and  wiring. 

The  first  thing  to  consider  is  the  number  of  speakers  that 
are  needed  to  produce  3D  spatial  audio  for  a  specific  appli¬ 
cation.  A  minimally  encompassing  setup  produces  sound 
from  all  directions  about  the  user  (left-right,  front-back,  and 
up-down).  The  speakers  can  be  placed  in  any  configura- 


Figure  1:  View  of  3D  spatial  audio  test  case  in  the  immer¬ 
sive  room.  Visual  depictions  show  from  which  speakers  the 
sound  is  coming  for  the  current  view.  For  each  sound,  the 
three  speakers  closest  to  the  virtual  sound  source  are  used 
to  play  the  sound.  Their  volumes  are  varied  based  on  the 
distance  from  the  speaker  and  a  number  of  other  factors. 


tion,  but  setting  up  the  3D  audio  panning  functions  is  not 
as  simple  for  irregular  configurations.  We  decided  to  use 
eight  speakers  in  a  cubic  configuration  (each  speaker  at  a 
vertex  of  the  cube)  as  shown  in  Figure  2.  There  is  nothing 
special  about  the  speakers  needed  for  this  task-the  choice 
is  a  matter  of  budget  and  taste.  We  used  eight  amplified 
commercial-grade  speakers  for  the  simple  reason  that  we 
already  had  them  in  our  lab. 

The  eight  speakers  require  a  sound  card  that  can  produce 
eight  channels  of  audio.  Of  the  low-cost  commodity  audio 
cards,  the  only  candidates  are  the  7.1  cards.  We  chose  the 
Creative  Labs  Audigy  2  card  which  we  found  available  (at 
the  time  of  writing)  for  as  low  as  $90  USD.  Although  it 
is  possible  to  produce  eight  independent  channels  of  audio 
using  more  expensive  sound  cards,  the  Audigy  2  card  is  the 
only  commodity  card  we  are  aware  of  that  has  drivers  in 
place  to  support  what  we  are  doing. 

It  is  important  to  understand  the  outputs  from  the  card. 
In  a  typical  Dolby® Surround  Sound  7.1  speaker  arrange¬ 
ment,  there  are  two  front  speakers  (left  and  right),  two  side 
speakers,  two  rear  speakers,  a  center  speaker  (that  sits  in 
front  above  the  video  screen),  and  a  subwoofer  channel  (the 
“.1”  speaker).  The  Audigy  2  ZS  has  three  analog  output 
jacks  (1/8-inch  mini-phone),  labeled  1,  2,  and  3,  providing 
line-level  outputs  for  the  eight  speakers.  Jack  1  is  three- 
pole,  meaning  it  carries  three  signals-two  signals  drive  the 
front  left  and  right  speakers  and  the  third  is  ground.  Jacks 
2  and  3  are  four-pole,  each  carrying  four  signals.  Jack  2 
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Figure  2:  Audio  Hardware  Setup 


drives  the  rear  speaker  pair  and  a  side  speaker,  while  jack  3 
drives  the  subwoofer,  the  center  speaker,  and  the  remaining 
side  speaker.  One  final  consideration  is  that  these  signals 
are  unamplified  line-level,  so  the  speakers  need  to  be  the 
amplified  type  that  accept  line-level  inputs,  or  a  separate 
amplifier  (or  set  of  amplifiers)  should  be  used  between  the 
sound  card  and  the  speakers. 

The  next  step  is  to  install  the  speakers.  Many  speakers 
designed  for  surround  usage  include  mounts,  but  speaker 
mounts  are  available  commercially  for  a  number  of  other 
speaker  types.  In  our  application,  we  already  had  the  cubic 
infrastructure  in  place  and  used  custom  mounts  to  attach  the 
speakers  to  the  cube.  The  simpler  the  speaker  configuration 
(placing  them  on  a  regular  shape,  using  45-and  90-degree 
angles,  etc),  the  simpler  the  software  configuration  will  be- 
that  process  will  be  explained  in  a  later  section. 

Finally,  the  speakers  must  be  connected  to  the  audio  card. 
How  one  connects  speakers  to  the  Audigy  2  will  depend  on 


the  type  of  speaker  and  amplification  used.  A  trip  to  a  fa¬ 
vorite  electronics  store  should  yield  any  necessary  connec¬ 
tors.  In  our  case,  our  speakers  each  have  a  two-pole  1/4-inch 
phone  jack,  so  we  needed  to  split  the  three  combined  out¬ 
puts  of  the  sound  card  into  eight  separate  signals.  For  Jack 
1,  we  used  an  easily-available  1/8-inch-stereo-to-dual-RCA 
adapter.  For  Jacks  2  and  3,  we  found  a  similar  adapters 
with  four  poles  and  three  RCA  connectors.  These  adapters 
are  most  commonly  used  with  camcorders  (the  three  signals 
used  for  composite  video  and  stereo  audio).  These  adaptors 
gave  us  eight  separate  RCA  connectors,  and  after  obtaining 
eight  long  RCA-to- 1/4-inch-mono  cables,  we  were  set. 

In  our  final  configuration,  we  used  an  Alesis  Studio  32 
mixer  board.  This  device  fits  inline  between  the  audio 
card’s  outputs  and  speakers’  inputs  and  allows  fine  tuning 
of  the  volume  levels.  Although  the  board  made  it  a  little 
easier  to  test  and  tune  the  audio,  it  wasn’t  truly  necessary  as 
the  same  adjustments  can  be  made  in  software. 


4  Software  Selection  and  Setup 

The  software  solution  for  low-cost  3D  spatial  audio  is  best 
described  by  the  layered  hierarchy  shown  in  Figure  3.  The 
software  layers  required  to  interface  with  the  sound  cards 
include  low-level  audio  drivers  and  a  3D  spatial  audio  API. 
We  focused  our  primary  development  efforts  under  Linux 
because  of  easy  access  to  the  source  code  for  low-level  au¬ 
dio  drivers  and  the  overall  support  community  for  develop¬ 
ers  working  on  projects  like  this  one. 
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For  the  driver  layer,  we  chose  ALSA,  which  was  men¬ 
tioned  previously.  ALSA  provides  audio  and  Musical  In¬ 
strument  Digital  Interface  (MIDI)  functionality  to  the  Linux 
operating  system  and  supports  many  types  of  audio  hard¬ 
ware  ranging  from  consumer  sound  cards  to  professional 
multichannel  audio  interfaces.  We  selected  ALSA  since 
it  appeared  to  require  the  least  effort  to  generate  the  eight 
channels  we  needed  for  3D  spatial  audio.  Until  we  mod¬ 
ified  the  ALSA  driver  to  access  all  eight  channels,  it  only 
supported  6  channels  (5.1)  on  the  Audigy  2.  These  changes 
have  been  incorporated  into  ALSA,  but  may  or  may  not  be 
in  a  release  version  at  publication  time.  In  that  case,  one 
can  get  the  latest  source  and  build  it-be  sure  to  include  the 
“emulOkl”  sound  card  argument  when  using  the  “./config¬ 
ure”  script  so  that  the  ALSA  driver  will  recognize  the  Au¬ 
digy  card. 

After  the  driver  is  set  up,  the  3D  spatial  sound  API  can 
be  installed.  It  will  distribute  sound  effects  from  a  given  3D 
position  to  the  appropriate  audio  channels.  Although  there 
are  quite  a  few  APIs  to  choose  from,  we  chose  Mustajuuri, 
mentioned  in  the  background  section.  The  Mustajuuri  soft¬ 
ware  works  with  ALSA  and  provides  3D  panning  over  an 
arbitrary  array  of  speakers  using  the  VBAP  algorithm  as  de¬ 
scribed  previously.  The  Mustajuuri  API  provides  all  of  the 
features  needed  for  a  basic  3D  positional  sound  system  and 
is  fairly  easy  to  extend.  Over  the  course  of  this  project,  we 
made  several  minor  source  code  modifications  and  they  are 
included  in  the  October  2004  release. 

Mustajuuri  does  its  magic  via  a  module  called  the  Mixer, 


which  mixes  multiple  sound  sources  (sound  files,  mi¬ 
crophone  inputs,  or  other  sources)  into  individual  audio 
streams.  These  streams  are  then  piped  into  a  panning  mod¬ 
ule,  which  is  responsible  for  routing  each  input  signal  to  the 
appropriate  speakers,  setting  the  correct  gain  and  time  delay 
at  each  speaker,  and  mixing  multiple  streams  meant  for  the 
same  speaker  into  a  single  stream  to  be  sent  to  that  speaker. 
It  does  the  routing  and  gain  calculations  based  on  VBAP, 
and  some  additional  gain  and  delay  calculations  based  on 
distance.  The  result  is  that  each  incoming  sound  source  to 
the  panning  module  leaves  to  a  set  of  three  speakers,  and  the 
resulting  sound  appears  to  come  from  a  specific  3D  position 
in  space.  Doppler  shifting  is  also  simulated. 

Once  Mustajuuri  is  compiled  and  installed,  there  are  sev¬ 
eral  tasks  that  must  be  performed  to  configure  the  software 
to  work  with  the  given  3D  speaker  array. 

Task  1:  Configuring  ALSA 

ALSA  needs  to  know  how  to  communicate  with  all  eight 
channels  of  the  audio  card.  This  would  normally  be 
achieved  by  simply  using  the  device  named  “surround71”, 
however  it  is  not  fully  compatible  with  the  spatial  sound 
API  Mustajuuri.  Mustajuuri  requires  support  for  input 
channels.  The  device  “surround71”  supports  eight  output 
channels,  but  not  any  input  channels.  Therefore  it  is  neces¬ 
sary  to  define  a  new  device  that  has  eight  output  channels 
and  some  input  channels. 

In  order  to  meet  this  requirement,  an  “asymmetric”  de¬ 
vice  is  defined.  The  device  is  called  “asymmetric”  because 
the  number  of  input  and  output  channels  are  not  necessar¬ 
ily  the  same.  Note  that  the  number  of  input  channels  are 
not  explicitly  stated.  ALSA  determines  the  number  of  input 
channels  automatically  and  assigns  the  maximum  (Audigy 
card  has  two). 

To  configure  ALSA,  add  the  following  text  into  the  file 
“/etc/asound.conf"  (or  create  the  file  if  necessary).  This  file 
holds  information  about  user  defined  devices,  and  so  we 
use  the  following  text  to  add  an  asymmetric  device  called 
“eightout”. 

ctl.eightout  { 
type  hw 
card  0 

} 

pcm. eightout  { 
type  asym 
playback. pcm  { 
type  route 
slave. pcm  surround71 
ttable.0.0  1 
ttable .1.1  1 


ttable .2.2  1 
ttable.3.3  1 
ttable .4.4  1 
ttable. 5. 5  1 
ttable .6.6  1 
ttable .7.7  1 

} 

capture. pcm  { 
type  hw 
card  0 

} 

} 

Next,  an  environment  variable  must  be  set  to  allow  Musta- 
juuri  to  talk  to  the  audio  card  through  ALSA.  Set  the  fol¬ 
lowing  environment  variable: 

export  MJ_AUDIO_CONF= 

" Input=2=hw : 0 , 0  |  0utput=8=eightout " 

Once  this  is  done,  Mustajuuri  should  be  able  to  output  audio 
through  all  eight  channels  of  the  audio  card. 

Task  2:  Configuring  the  Mustajuuri  Mixer  Panel 

Mustajuuri  uses  a  mixer-board  style  GUI  for  sending  input 
audio  streams  to  a  speaker  array,  combining  them  or  just 
passing  them  through  intact.  The  input  streams  can  either 
come  from  sound  files  or  from  live  sources  (such  as  a  mi¬ 
crophone).  The  GUI  lays  out  several  strips  of  channels  that 
can  be  assigned  different  functions  that  are  applied  in  a  se¬ 
quential  process.  Some  example  functions  are  input,  send 
(to  speaker),  amplitude  gain,  panning  and  synthesizer.  The 
gain  and  panning  modify  how  the  audio  is  distributed  to  in¬ 
dividual  output  audio  channels. 

The  Mixer  Panel  configuration  we  use  can  be  seen  in  Fig¬ 
ure  4,  which  uses  two  mixer  strips.  The  first  has  two  in¬ 
teresting  channels:  a  synthesizer  channel,  which  manages 
the  sound  files,  and  a  panning  module,  which  handles  the 
VBAP-based  panning  across  speakers.  The  second  strip  is 
used  to  manage  remote  connections  from  external  applica¬ 
tions  and  does  not  accept  an  audio  stream  as  input.  It  sends 
commands  to  the  synthesizer  and  the  VBAP  module. 

To  create  a  similar  configuration,  launch  Mustajuuri  and 
create  a  new  mixer  from  the  “File”  menu.  This  mixer  will 
have  several  strips  already,  and  all  of  these  strips  will  be 
essentially  blank.  The  number  of  strips  and  the  number  of 
modules  per  strip  can  be  changed  using  the  “Edit”  menu 
if  needed.  Modules  can  be  assigned  by  clicking  with  the 
mouse  on  a  particular  slot.  To  adjust  the  module’s  proper¬ 
ties,  just  click  on  the  blue  link  defining  the  module’s  type 
(i.e.  “Synthl”,  “Mixer  Input”,  etc.).  The  “Strip  X”  button 
at  the  top  of  a  strip  can  be  used  to  modify  and  remove  the 
modules  in  any  slot  in  that  strip.  All  mixer  configuration 
changes  are  saved  by  using  the  save  options  from  the  “File” 


Figure  4:  Screenshot  of  the  Mustajuuri  Mixer  GUI  (Tommi 
Ilmonen  [6]) 


menu.  The  resulting  configuration  file  (e.g.,  SpatialAu- 
dio.mj)  is  specified  on  the  command  line  (e.g.,  “./Musta¬ 
juuri  SpatialAudio.mj”)  when  Mustajuuri  is  called. 

Task  3:  Specifying  Speaker  Placement 

In  order  to  use  VBAP,  it  is  necessary  for  Mustajuuri  to  know 
the  locations  of  the  speakers  in  the  3D  array.  Mustajuuri 
does  this  through  a  configuration  file  that  is  specified  as  part 
of  the  VBAP  panning  module  setup  (this  module  was  cre¬ 
ated  as  part  of  Task  2).  This  file  specifies  the  azimuth  and 
elevation  angles  (in  degrees)  for  each  speaker  relative  to  the 
listener.  Since  our  system  uses  eight  speakers  arranged  in 
a  cube  configuration,  our  configuration  file  is  specified  as 
follows: 

3  #  dimensionality 

#  Azimuth,  followed  by  elevation. 

#  E.g.,0  0  would  be  straight  ahead. 

-45  45  #  Front  upper  left 

45  45  #  Front  upper  right 

-135  45  #  Back  upper  left 

135  45  #  Back  upper  right 


-45  -45  #  Front  lower  left 

45  -45  #  Front  lower  right 

-135  -45  #  Back  lower  left 

135  -45  #  Back  lower  right 

This  configuration  file  is  used  by  the  main  configuration 
file  for  the  Mustajuuri  Mixer.  It  should  be  noted  that  this 
file  assumes  that  all  speakers  are  equidistant  from  the  lis¬ 
tener.  If  this  is  not  the  case,  it  will  be  necessary  to  adjust 
the  gain  and  delay  for  each  speaker  (using  the  Mustajuuri 
Mixer)  manually.  In  our  system,  this  was  not  necessary  and 
it  would  involve  significant  work  if  it  were  necessary.  The 
easiest  solution  is  to  try  to  place  all  speakers  equidistant 
from  the  ideal  listening  position  (i.e.  the  center  of  the  envi¬ 
ronment). 

Task  4:  Configuring  Sound  File  Loader 

In  order  to  use  a  sound  file  from  Mustajuuri,  that  sound  file 
must  be  known  to  Mustajuuri  at  load  time.  The  mechanism 
used  to  do  that  is  a  configuration  file  that  specifies  all  sound 
files  that  might  possibly  be  used  by  Mustajuuri.  This  con¬ 
figuration  file  is  used  by  the  synthesizer  mixer  module  (cre¬ 
ated  in  Task  2).  A  sample  configuration  file  that  loads  three 
sound  files  follows.  Once  a  file  is  created,  any  name  can  be 
assigned,  and  make  sure  the  synthesizer  module  points  to 
that  file. 

unusevoices  *-stk 
polyphony  48 

sample  audioeffectl.wav 
sample  audioeffect2.wav 
sample  sudioeffect3.wav 

The  polyphony  line  specifies  the  maximum  number  of  au¬ 
dio  files  that  will  be  loaded  by  Mustajuuri  (so  it  should  be 
at  least  as  large  as  the  number  of  audio  files  listed  in  this 
file).  The  last  three  lines  specify  three  sample  audio  WAV 
files.  Any  audio  files  specified  here  must  be  placed  in  a  di¬ 
rectory  that  is  specified  separately  as  part  of  the  synthesizer 
module  configuration.  Note  that  the  “unusevoices”  line  is  a 
somewhat  more  advanced  setting,  but  should  help  improve 
efficiency  somewhat. 

Task  5:  Configuring  Mustajuuri  for  Remote  Control 

Mustajuuri  is  designed  to  act  as  a  standalone  program  to 
manipulate  audio,  not  as  a  library  to  be  linked  against  with 
another  application.  In  order  to  control  Mustajuuri  from  an¬ 
other  application,  as  is  the  case  with  our  project,  two  steps 
are  required.  The  first  involves  setting  up  Mustajuuri  to  lis¬ 
ten  for  control  commands  over  the  network  (this  task).  The 
second  includes  writing  a  simple  API  in  the  main  applica¬ 
tion  to  talk  to  Mustajuuri  (the  next  task). 


To  get  Mustajuuri  to  accept  commands  from  the  network, 
a  network  module  must  be  loaded.  This  network  module  is 
the  only  way  for  an  external  application  to  control  Mustaju¬ 
uri,  even  if  both  the  application  and  Mustajuuri  are  running 
on  the  same  machine.  Adding  this  module  is  very  simple 
and  only  requires  configuring  which  port  Mustajuuri  will 
listen  to  (the  default  port  is  10030).  This  module,  which 
should  have  been  created  in  Task  2  above,  will  automat¬ 
ically  communicate  with  the  synthesizer  module  and  the 
VBAP  panning  modules,  if  they  are  in  the  mixer. 

Task  6:  Interfacing  with  Mustajuuri  API 

In  order  to  control  Mustajuuri  from  an  application,  it  is  nec¬ 
essary  to  add  code  included  with  the  Mustajuuri  API  into 
the  application.  We  present  example  code  segments  show¬ 
ing  how  to  connect  to  a  remote  audio  server,  play  audio 
specified  coming  from  a  given  3D  position,  and  change  the 
position  of  a  sound  source  and  listener  position. 

The  first  code  segment  shows  the  initial  commands  to 
connect  the  application  to  the  remote  Mustajuuri  audio 
server  and  initialize  it.  If  Mustajuuri  is  running  on  another 
machine,  change  the  address  to  reflect  this.  To  change  the 
default  port  (10030)  that  Mustajuuri  listens  to,  this  is  ac¬ 
complished  by  specifying  the  new  port  in  the  address  string 
(e.g. ,  “mj server.mydomain.com:  12345”). 

The  two  objects  that  we  created,  one  instance  each  of 
AC  Control  and  AC  VrContml,  will  be  used  later  to  send 
commands  to  Mustajuuri. 

//  connecting  to  a  remote  server 

#include  <ac_vr_control . h> 

AC_Control  acControl  = 
new  AC_Control ( )  ; 

char*  m j ServerAddress  =  "127.0.0.1"; 

if ( ! acControl->init (mj ServerAddress ) ) 

{ 

//  error  handling  code  here 

} 

AC_VrControl  acVrControl  = 

new  AC_VrControl (  acControl  ) ; 

The  next  code  segment  shows  how  to  specify  the  posi¬ 
tion  of  the  source  of  the  audio  and  play  it.  One  interesting 
thing  to  note  here  is  that  the  variable  “outputChannel”  iden¬ 
tifies  the  intended  sound  source  to  work  with.  The  number 
of  supported  sound  sources  was  specified  in  the  synthesizer 
module  from  Task  2,  and  “outputChannel”  should  be  be¬ 
tween  0  and  the  number  of  sources  minus  one.  The  variable 
“soundFilename”  should  not  have  a  path  as  part  of  the  file¬ 
name.  The  filename  should  be  one  of  the  files  listed  in  the 
configuration  file  created  as  part  of  Task  4  (the  sound  file 


loader  configuration  file).  Lastly,  the  soundLevel  is  essen¬ 
tially  the  initial  gain  level  for  the  new  sound.  This  will  need 
to  be  experimented  with  to  find  an  appropriate  setting. 

AC_Vector3  location! 
positionX, 
positionY, 
positionZ) ; 

int  soundld  = 

acVrControl->playSample ( 
output Channel, 
soundFilename, 
soundLevel , 
location, 
true, 

0,  0)  ; 

The  last  code  segment  shows  how  to  reposition  the  sound 
source  location  and  the  orientation  and  position  of  the  lis¬ 
tener.  The  “outputChannel”  variable  refers  to  the  sound 
source  that  is  desired  to  be  moved,  and  will  be  the  same 
value  that  was  used  to  call  “playSample”  from  the  previous 
example.  The  “listenerRotation”  matrix  specifies  the  orien¬ 
tation  of  the  listener  relative  to  the  world,  and  the  “worldRo- 
tation”  matrix  specifies  the  orientation  of  the  world  relative 
to  the  speakers. 

//  reposition  a  source  of  sound 

AC_Vector3  location! 
positionX, 
positionY, 
positionZ)  ; 

acVrControl->moveSource (outputChannel , 
0.05,  location) ; 

//  reposition  the  listener  orientation 

AC_Matrix3  listenerRotation! 

. . .  listener  rotation  matrix  ...  ) ; 

AC_Matrix3  worldRotation ( 

. . .  world  rotation  matrix  ...  ) ; 

acVrControl->set Trans format ions ( 
location,  listenerRotation, 
worldRotation,  0.05); 

5  Hardware  Testing  and  Calibration 

We  tested  the  hardware  design  of  our  3D  spatial  audio  sys¬ 
tem  by  integrating  the  hardware  with  our  4-wall  immersive 
virtual  reality  room  at  the  Virtual  Reality  Laboratory,  part 
of  the  Naval  Research  Laboratory  in  Washington  D.C.  We 
arranged  the  speakers  in  a  cube  array  and  placed  them  at 


Room 

Figure  5:  Immersive  room  depiction  showing  placement  of 
speakers  in  a  cube  array  and  audio  coming  from  the  user’s 
front-upper-right  direction  (lines  are  colored  red). 


the  corners  of  the  immersive  roon  as  shown  in  Figure  5.  We 
designated  a  1.2  GHz  Redhat  Linux  machine  as  an  audio 
server  and  installed  in  it  an  Audigy  2  ZS  card.  We  con¬ 
nected  the  speakers  using  the  cabling  described  before,  and 
tried  the  system  both  with  and  without  the  mixing  board 
mentioned  earlier. 

CPU  utilization  on  this  machine  while  Mustajuuri  was 
running  with  3  audio  sources  in  motion  was  generally  less 
than  20%,  and  the  memory  usage  was  negligible.  Fur¬ 
ther  savings  could  of  course  be  realized  by  using  optimized 
compiler  settings,  rather  than  the  debug  settings  we  used. 

We  tested  the  outputs  from  each  speaker  to  determine  the 
range  of  intensities  that  could  be  played  on  each  channel. 
We  listened  to  each  speaker  individually  for  sound  quality, 
sound  balance,  and  percussive  resonance.  The  easiest  as¬ 
pect  to  listen  for  is  the  sound  balance  between  treble  and 
bass.  If  one  of  the  two  is  obviously  higher  than  the  other, 
adjust  the  necessary  frequency  filters  as  needed.  For  ex¬ 
ample,  if  there  is  too  much  bass,  decrease  the  bass  and/or 
increase  the  treble.  If  there  seems  to  be  excessive  low  end 
or  high  end  noise,  adding  a  low-pass  or  high-pass  filter  may 
be  necessary. 

Another  easy  aspect  to  listen  for  is  speaker  distortion. 
Simply  put,  if  the  speakers  are  so  loud  that  the  sound  pro¬ 
duced  is  bad,  lower  the  volume  of  the  speaker.  If  a  given  set 
of  speakers  will  not  produce  quality  sound  at  an  acceptable 
volume,  it  may  be  necessary  to  get  more  powerful  speakers. 

One  of  the  hardest  aspects  to  listen  for  is  the  resonance 
of  percussive  sounds  generated  by  a  speaker.  This  qual¬ 
ity  is  basically  how  much  the  sound  echoes  from  where  the 
speaker  is  located.  Adjustments  have  to  be  made  if  it  sounds 
like  a  speaker  is  reverberating  with  percussive  sounds.  De¬ 
pending  on  the  quality  of  the  speaker,  and  the  quality  of  the 
mixer  board,  this  problem  may  be  corrected  to  some  degree 
by  continuing  to  filter  the  signal.  For  excessively  bad  cases, 
hard  objects  such  as  exposed  metal,  concrete,  hard  plastic. 


and  even  glass  should  be  covered  with  a  sound  dampening 
material  like  cloth  or  foam. 

Once  each  speaker  is  calibrated,  the  entire  setup  has  to 
be  balanced.  It  can  either  be  done  with  devices  designed  to 
measure  acoustic  levels  or  by  listening  to  the  speaker  from 
the  pre-determined  center  of  the  3D  speaker  array.  Either 
way,  the  gain  of  each  speaker  should  be  adjusted  until  the 
same  audio  intensity  level  is  received  from  each  channel. 
Keep  in  mind  that  the  outputs  for  each  channel  of  the  audio 
card  were  customized  by  the  manufacturer  for  the  inten¬ 
sity  requirements  for  each  type  of  speaker  (satelite,  center, 
and  subwoofer)  normally  attached  in  a  surround  configura¬ 
tion,  and  the  intensity  output  for  each  type  differs.  There 
are  many  published  methods  for  dealing  with  this  problem, 
but  we  went  with  the  low-tech  solution  of  having  some¬ 
one  stand  and  listen  at  the  center  of  our  speaker  array.  We 
set  the  software  control  to  maximum  gain  and  adjusted  the 
mixer  board  based  on  feedback  from  the  listener.  Remem¬ 
ber  that  these  changes  can  be  made  in  software  with  the 
ALSA  drivers  and  Mustajuuri  if  a  mixing  board  is  not  avail¬ 
able. 

6  Software  Testing 

We  tested  the  software  by  integrating  sound  into  an  existing 
in-house  simulation  platform,  BARS-Utopia  [11],  that  oper¬ 
ates  on  a  Linux  visualization  cluster  (ORAD  Incorporated, 
orad-ny.com)  which  drives  our  immersive  room.  BARS- 
Utopia  supports  several  virtual  world  databases,  interaction 
methods,  and  spatial  audio.  However,  no  support  was  avail¬ 
able  for  interacting  with  the  Mustajuuri  API  in  particular,  so 
we  implemented  a  plugin  to  bridge  the  BARS-Utopia  spa¬ 
tial  audio  support  with  Mustajuuri.  BARS-Utopia  already 
contains  all  of  the  information  needed  by  Mustajuuri,  such 
as  sound  source  positions,  listener  position  and  orientation, 
and  sound  source  creating/deletion  notifications-the  plugin 
simply  translates  that  data  into  a  form  that  Mustajuuri  un¬ 
derstands. 

When  the  plugin  was  completed,  we  tested  and  debugged 
the  new  system.  The  primary  software  adjustments  we 
made  were  to  the  attenuation  level  of  the  audio  channel  out¬ 
puts.  Mustajuuri  uses  a  simple  attenuation  model  and  re¬ 
quires  some  manual  tweaking  for  the  expected  environment 
(i.e.  outdoor,  indoor,  time  of  year,  etc.).  In  the  real  world, 
sound  attenuation  rates  are  quite  complicated  and  are  in¬ 
fluenced  by  factors  such  as  temperature,  humidity,  and  the 
frequency  makeup  of  the  sound. 

We  tested  the  sound  system  by  implementing  several  sce¬ 
narios,  each  with  a  different  scene  dataset  and  different  au¬ 
dio  effects  attached  to  an  animated  object.  Before  the  audio 
objects  were  animated,  we  evaluated  several  volume  levels 
and  several  distances  away  for  each  object.  Figure  1  shows 
a  simple  scenario  we  designed  and  tested  the  sound  effects 


of  a  car.  When  we  finished  testing  the  volume  and  distance 
effects,  we  generated  an  animated  path  for  the  car  to  follow. 

Figure  6  shows  a  more  complex  scenario  with  three  audio 
sources  (tank,  jet,  and  helicopter) — the  jet  is  off  the  screen 
and  not  shown  in  view.  We  performed  some  simple  tests 
to  see  how  many  sound  sources  interacted  together.  It  was 
of  primary  importance  that  the  jet  (typically  far  away)  not 
sound  too  quiet  and  the  tank  and  helicopter  (typically  closer 
to  the  camera)  not  dominate  the  aureal  bandwidth,  so  some 
minor  tweaking  resulted  for  both  the  far  and  near  objects’ 
attenuation  parameters. 

7  System  Validation 

After  all  of  the  testing  and  calibration  was  completed,  we 
performed  two  informal,  qualitative  user  tests  that  would 
help  us  validate  our  new  low-cost  spatial  audio  system.  The 
first  test  evaluated  how  the  new  sound  system  configuration 
with  eight  speakers  compared  with  our  previous  planar  con¬ 
figuration  containing  four  speakers.  The  prior  configuration 
simply  used  the  four  speakers  on  the  top  of  the  cube  array. 
We  realize  that  directly  comparing  these  two  configurations 
is  somewhat  biased,  due  to  the  placement  of  the  4-speaker 
array  being  located  above  the  user’s  head-it  would  be  fairer 
to  compare  against  a  4-speaker  array  located  at  the  height 
of  the  user.  However,  by  using  the  top  four  speakers,  we 
were  able  to  switch  between  the  two  configurations  without 
dismantling  our  installation. 

We  performed  the  experiment  by  asking  a  few  test  sub¬ 
jects  to  stand  in  the  middle  of  the  immersive  room  and  listen 
to  sounds  played  for  each  configuration.  We  played  differ¬ 
ent  sequences  of  audio  on  both  speaker  configurations  and 
made  use  of  the  full  range  of  speakers  available.  The  sub¬ 
jects  were  not  told  which  configurations  were  being  used, 
nor  in  which  order  the  pairs  of  configurations  were  pre¬ 
sented.  Several  iterations  of  the  pairs  of  configurations  were 
tried  for  each  subject.  After  each  pair  was  presented,  the 
subjects  rated  the  two  systems.  Admittedly,  this  was  not  a 
scientific  test  as  is  evidenced  by  several  unaddressed  biases, 
but  all  test  subjects  clearly  preferred  the  8-speaker  configu¬ 
ration. 

The  second  user  test  evaluated  how  well  the  listener  is 
able  localize  the  source  of  the  audio  using  the  8-speaker 
configuration.  Again,  the  subjects  were  tested  and  each 
were  asked  to  stand  in  the  center  of  the  immersive  room. 
Each  subject  was  presented  with  several  sounds  played  one 
at  a  time  and  originating  from  different  positions  surround¬ 
ing  the  subject.  The  subjects  were  asked  to  point  in  the 
direction  of  the  sound  source,  as  they  heard  it.  The  visual 
system  was  not  running,  so  the  users  did  not  get  visual  cues 
as  to  the  sound  source’s  location.  The  subjects  were  able 
to  localize  the  sounds  with  a  high  degree  of  accuracy,  espe¬ 
cially  with  respect  to  elevation. 


Figure  6:  Shows  a  user  interacting  with  a  scene  with  multiple  sound  sources  (a  tank,  helicopter  and  a  jet  (not  visible)).  The 
nearest  speakers  are  determined  for  each  sound  source  and  the  output  is  mixed  at  each  speaker  (if  overlapping). 


The  implementation  of  our  3D  spatial  audio  system  inte¬ 
grated  with  our  immersive  room  really  enhanced  the  simu¬ 
lation  and  training  demos  we  have.  Our  completed  system 
has  dramatically  improved  the  sense  of  immersion  when 
running  the  demos.  A  simulation  user  easily  perceives  heli¬ 
copters  and  jets  flying  overhead  and  a  tank  rumbling  down 
one  of  the  many  streets  nearby  in  the  virtual  world.  The  per¬ 
ception  of  depth  from  the  source  of  audio  is  conveyed  very 
accurately  and  also  includes  doppler  effects.  Our  system 
is  a  step  above  a  4-speaker  solution  we  had  previously  us¬ 
ing  the  M i  c r o s o f t @  D  i  r c c t S o u  n d (R)  A  PI .  It  was  also  a  good 
replacement  for  the  very  capable  but  outdated  and  unsup¬ 
ported  8-speaker  solution  we  had  running  using  another 
very  expensive  hardware  and  software  platform. 

8  Conclusions  and  Future  Work 

We  have  devised  a  true  3D  spatial  audio  solution  that  is 
low  cost  and  has  comparable  quality  to  high-end  expensive 


commercial  systems.  The  3D  spatial  audio  solution  allows 
sound  effects  to  be  generated  from  all  directions  surround¬ 
ing  a  user,  not  only  planar  directions.  We  accomplished  this 
feat  by  using  only  commodity  hardware  and  open  source 
software.  We  feel  this  feature,  now  available  at  an  afford¬ 
able  price,  will  create  numerous  options  for  game  and  vir¬ 
tual  reality  system  developers. 

We  feel  our  system  will  lead  the  way  for  others  to  de¬ 
vise  similar  solutions  with  current  and  future  commodity 
audio  equipment.  The  developer  needs  only  to  purchase  a 
Dolby(R)Surround  Sound  7. 1  audio  card,  four  pairs  of  low- 
cost  speakers,  and  audio  cables.  We  spent  less  than  $150 
USD  on  hardware  (Audigy  2  audio  card  and  audio  cables) 
since  we  already  had  speakers  available.  From  start  to  fin¬ 
ish,  including  hardware  and  software  debugging,  configur¬ 
ing  and  testing,  we  spent  less  than  a  month  developing  the 
low-cost  3D  spatial  audio  system.  We  feel  that  using  this 
document  as  a  guide,  it  should  be  possible  to  implement 


this  system  in  less  than  a  week.  Obviously,  if  a  currently 
unsupported  audio  card  is  used,  there  will  be  time  and  ex¬ 
pertise  needed  to  write  a  driver  for  it. 

Although  the  system  currently  meets  our  needs  quite 
nicely,  these  features  would  be  nice  to  add  to  the  3D  spa¬ 
tial  sound  API  in  the  future: 

•  Directional  Sound  Cones:  Directional  sound  cones 
are  a  mechanism  to  provide  directional  sound  with  the 
strongest  intensities  propagated  along  the  central  axis 
of  the  cone  and  weakest  towards  the  edges.  Since 
many  sound  sources  are  directional  by  nature  (e.g., 
sound  emanated  from  a  megaphone),  directional  sound 
cones  would  allow  these  sound  sources  to  be  more  ac¬ 
curately  generated.  Also,  since  some  major  APIs  (such 
as  Microsoft(g)DirectSound®)  offer  sound  cones,  it 
would  be  nice  to  offer  such  a  feature. 

•  Additional  Environmental  Reverberation  Effects: 

Although  there  are  a  number  of  very  simple  environ¬ 
mental  reverberation  effects  available  in  the  common 
3D  spatial  audio  APIs,  supporting  more  sophisticated 
effects  such  as  sound  reflection  and  absorption  off  of 
different  surfaces  would  greatly  enhance  the  listening 
experience.  This  is  an  area  of  ongoing  research,  and 
the  Mustajuuri  system  would  be  a  good  testbed  for  try¬ 
ing  out  new  techniques. 

•  Enhanced  Sound  Attenuation  Level:  The  current 
distance  attenuation  models  for  sound  in  Mustajuuri 
are  quadratic  by  nature,  thus  are  fairly  simple.  In  the 
real  world,  sound  attenuation  is  much  more  sophisti¬ 
cated  and  depends  on  heat,  humidity,  sound  frequency, 
and  many  other  factors.  For  example,  the  low  fre¬ 
quency  sounds  generally  carry  much  farther  than  high 
frequency.  Accounting  for  these  complexities  could 
help  significantly  in  providing  distance  cues. 
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